91 research outputs found

    Epicure: Distilling Sequence Model Predictions into Patterns

    Full text link
    Most machine learning models predict a probability distribution over concrete outputs and struggle to accurately predict names over high entropy sequence distributions. Here, we explore finding abstract, high-precision patterns intrinsic to these predictions in order to make abstract predictions that usefully capture rare sequences. In this short paper, we present Epicure, a method that distils the predictions of a sequence model, such as the output of beam search, into simple patterns. Epicure maps a model's predictions into a lattice that represents increasingly more general patterns that subsume the concrete model predictions. On the tasks of predicting a descriptive name of a function given the source code of its body and detecting anomalous names given a function, we show that Epicure yields accurate naming patterns that match the ground truth more often compared to just the highest probability model prediction. For a false alarm rate of 10%, Epicure predicts patterns that match 61% more ground-truth names compared to the best model prediction, making Epicure well-suited for scenarios that require high precision

    Rete: Learning Namespace Representation for Program Repair

    Get PDF
    A key challenge of automated program repair is finding correct patches in the vast search space of candidate patches. Real-world programs define large namespaces of variables that considerably contributes to the search space explosion. Existing program repair approaches neglect information about the program namespace, which makes them inefficient and increases the chance of test-overfitting. We propose Rete, a new program repair technique, that learns project-independent information about program namespace and uses it to navigate the search space of patches. Rete uses a neural network to extract project-independent information about variable CDU chains, def-use chains augmented with control flow. Then, it ranks patches by jointly ranking variables and the patch templates into which the variables are inserted. We evaluated Rete on 142 bugs extracted from two datasets, ManyBugs and BugsInPy. Our experiments demonstrate that ReTe generates six new correct patches that fix bugs that previous tools did not repair, an improvement of 31% and 59% over the existing state of the art

    Modus: a Datalog dialect for building container images

    Get PDF
    Containers help share and deploy software by packaging it with all its dependencies. Tools, like Docker or Kubernetes, spawn containers from images as specified by a build system’s language, such as Dockerfile. A build system takes many parameters to build an image, including OS and application versions. These build parameters can interact: setting one can restrict another. Dockerfile lacks support for reifying and constraining these interactions, thus forcing developers to write a build script per workflow. As a result, developers have resorted to creating ad-hoc solutions such as templates or domain-specific frameworks that harm performance and complicate maintenance because they are verbose and mix languages. To address this problem, we introduce Modus, a Datalog dialect for building container images. Modus' key insight is that container definitions naturally map to proof trees of Horn clauses. In these trees, container configurations correspond to logical facts, build instructions correspond to logic rules, and the build tree is computed as the minimal proof of the Datalog query specifying the target image. Modus relies on Datalog’s expressivity to specify complex workflows with concision and facilitate automatic parallelisation. We evaluated Modus by porting build systems of six popular Docker Hub images to Modus. Modus reduced the code size by 20.1% compared to the used ad-hoc solutions, while imposing a negligible performance overhead, preserving the original image size and image efficiency. We also provide a detailed analysis of porting OpenJDK image build system to Modus
    • …
    corecore